A novel adaptive-weight ensemble surrogate model base on distance and mixture error

Surrogate models are commonly used as a substitute for the computation-intensive simulations in design optimization. However, building a high-accuracy surrogate model with limited samples remains a challenging task. In this paper, a novel adaptive-weight ensemble surrogate modeling method is proposed to address this challenge. Instead of using a single error metric, the proposed method takes into account the position of the prediction sample, the mixture error metric and the learning characteristics of the component surrogate models. The effectiveness of proposed ensemble models are tested on five highly nonlinear benchmark functions and a finite element model for the analysis of the frequency response of an automotive exhaust pipe. Comparative results demonstrate the effectiveness and promising potential of proposed method in achieving higher accuracy.


Introduction
Computational simulation models have been widely used to as a substitute for costly and timeconsuming physical experiments [1].In spite of sustained growths in computer capability and speed, the enormous computational cost still makes it impractical to rely solely on highly expensive simulation models for design and optimization [2].Surrogate modeling, as a costeffective alternative to expensive simulation models, is a special case of supervised machine learning applied in the field of engineering design [3][4][5][6].Over the past few decades, there is a growing interest in developing surrogate modeling techniques in many different engineering disciplines [7][8][9][10].There are a lot of commonly used metamodels, include Polynomial Regression models [11], Support Vector Regression models [12,13], Radial Basis Function interpolation models [14,15], Kriging/Gaussian Process models [16][17][18], Artificial Neural Network models [19,20], et al.
A more detailed overview on various surrogate modeling techniques can be found in Refs.[21,22].Han [23] et al. conducted comparative analysis of different surrogate modeling methods.Prior studies have shown that ANN is better suited for ultra-high-dimensional problems with over 1000 variables.RBF is suitable for approximating high-order nonlinear functions, while KRG is suitable for fitting functions with high dimensionality and low nonlinearity.PRS exhibits good approximation capabilities for low-dimensional, low nonlinearity functions, or functions with data noise.When training samples are limited, SVR demonstrates better fitting performance and is more suitable for handling low-dimensional problems.Normally, there is no single surrogate model that can provide the best prediction accuracy for every scenario.
To ease this problem, the ensemble surrogate model, constructed by combining multiple individual surrogate models, have been widespread concerned [24,25].The ensemble surrogate model can leverage the advantages of each individual model at a lower computational cost and extract trend information from the design space.In other words, it would be beneficial to combine multiple surrogate models to enhance the accuracy of the response predictions [26].There is a growing interest in developing ensemble surrogate models.For instance, based on the principles of neural network modeling, Zerpa et al. [27] proposed an ensemble surrogate model in which the weights of the component models are calculated based on cross-validation mean squared errors.The smaller the cross-validation error of a component model, the larger its corresponding weight.Goel et al. [28] proposed a weighted ensemble surrogate model based on predicted residual error sum of squares.Viana et al. [24] and Acar and Rais-Rohani [29] introduced an optimization strategy to optimize the weights of the component surrogate models.Based on Dempster-Shafer theory, Muller et al. [30] developed a weight calculation method to construct an ensemble surrogate model.In their work, an evidence combination rule is adopted to comprehensively consider various error metrics of the component surrogate models.
It is important to point out that the previously mentioned ensemble surrogate model is fixweight ensemble surrogate model(FWESM) across the entire design space.The main shortcoming of FWESM is that they fail to adaptively adjust the weight of component models based on the local error of those component models.In recent years, the studies on adaptive weight ensemble surrogate model(AWESM), which is more related to the method to be developed in this work, have attracted significant attention to many researchers.The objective of employing adaptive weights is to dynamically allocate greater importance or influence to the component models that perform better, while diminishing the effect of weaker models.In this respect, Sanchez et al. [31] proposed an AWESM in which the weight factors of the component surrogate models are calculated based on the prediction variance of the nearby training samples.Zhang et al. [32] proposed a unified ensemble of surrogates (UES) method, which uses a linear combination of a global measure-based weight calculation and a local measure-based weight calculation to define weight factors.Huang et al. [33] proposed a robust ensemble surrogate model based on a hybrid error measurement.Indeed, when building an AWESM, it is crucial to account for the spatial relationship between the sample to be predicted and the entire training dataset.Furthermore, one must consider the fact that interpolation models exhibit better prediction accuracy in proximity to training sample points.Motivated from this finding, the main objective of the paper is to develop a new AWESM capable of providing a higher accuracy of the response predictions with limited samples.This paper presents a novel AWESM that considers several critical factors in its weight calculation.These factors include the spatial relationship between prediction samples and training samples, the assessment of mixture errors in the component surrogate model, and characteristics that resemble interpolation models.Our approach stands out from other methods in the literature in the following ways: Firstly, we calculate the initial weights for component surrogate models based on a combination of global and local errors.These initial weights are then refined according to the distance between the prediction sample and the training samples.Furthermore, we introduce a weight adjustment factor to fine-tune the weights of the fitting models, thereby imbuing our ensemble model with interpolation model characteristics.Secondly, the weight adjustment factor is determined by considering the distance between a prediction sample and its two nearest training samples.Ultimately, the final weights of the component surrogate models are computed by combining the initial weights and the weight adjustment factor.
The remainder of this paper is organized as follows: Firstly, the commonly used surrogate models, encompassing individual surrogate models and ensemble surrogate models, are introduced.Followed by the illustration of the proposed novel adaptive ensemble surrogate model.Then, engineering and numerical studies are carried out to evaluate the performance of the proposed method.The conclusions are provided in the last section.

Ensemble surrogate models
The surrogate model is a type of supervised learning algorithm that is used to approximate the input-output relationship of a computationally expensive numerical model.
In the equation, f e (x) represents the ensemble surrogate model, m is the number of component surrogate models used to construct the ensemble surrogate model, fi ðxÞ is the i th component surrogate model, and w i is a non-negative weight coefficient function associated with the  [28].The weights of the component surrogate models in the PWS model, denoted as w pws i , are calculated as follows: where E i represents generalized mean squared error of the i th component surrogate model, � E is the mean value of E 1 , E 2 , � � �, E m .α and β are control parameters that regulate the influence of the mean of individual component surrogate models and the mean of all component surrogate models on the ensemble surrogate model.Research has shown that setting η = 0.05 and β = −1 provides good stability for the PWS model.The generalized mean square error (GMSE) is obtained through leave-one-out cross-validation (LOOCV) and is defined as follows: where ŷÀ i ðx i Þ represents the prediction of the surrogate model for the sample x i , where the i th training sample is excluded from the model training.
In 2009, Acar and Rais-Rohani proposed a method called BestPRESS to calculate the weights of the component surrogate models.This method optimizes the weight coefficients to minimize the PRESS of the ensemble surrogate model [29].The weights w bpr i of the component surrogate models are obtained by solving the following optimization problem: ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi In 2009, Viana et al. proposed an optimization-based method called Optimal Weighted Surrogate (OWS) model to calculate the weights of the component surrogate models [24].This method minimizes the cross-validation error by considering the covariance matrix formed by the prediction residuals of the component surrogate models.The weights w ows i of the component surrogate models are obtained by solving the following optimization problem: where C is the covariance matrix.C i;j ¼ 1 N cv T i cv j represents the covariance between i th and j th component surrogate models derived from cross-validation errors.

Adaptive-weight ensemble surrogate model
The AWESM, also known as variable weighted ensemble surrogate, refers to a type of ensemble surrogate model where the contribution of component surrogate models to the ensemble model varies at different locations in the design space.The calculation rules for the weights in the AWESM typically take into account the local prediction accuracy of the component surrogate models.
In 2008, Sanchez et al. [31] proposed a weight calculation strategy for the AWESM based on the Prediction Variance (PV), which serves as a local error indicator.In their proposed method, the weight of a component surrogate model at a prediction sample x is given by: where PV i (x) represents the prediction variance of the i th component surrogate model at the prediction sample x.It is computed as: where the term k denotes the number of neighboring points of x, and based on testing, Sanchez et al. set its value to 3. The term ŷÀ l i ðx l Þ represents the leave-one-out prediction of the i th component surrogate model at the neighboring point x l .
In 2021, Huang et al. [33] proposed the ensemble of surrogate based on hybrid error metric (ESHEM) for weight calculation.The weights can be computed using the following equation: where GMSE i represents the generalized mean square error calculated through leave-one-out validation for the i th component surrogate model.When computing the prediction variance of neighboring points PV i (x) using Eq 6, only the neighboring points with Euclidean distance less than the average Euclidean distance between all training samples and the prediction sample are considered.

Materials and methods
More recently, Zhou et al. [34] pointed out that the weight combination strategy based on global error ensures the accuracy of the ensemble surrogate model across the entire design space, while the weight combination strategy based on local error serves as a diversity metric for individual surrogate models at different positions in the design space.In the design space, neighboring points often have similar values.Therefore, when the prediction sample is far from the main region where most sample points are located, its prediction value relies more on the values of adjacent sample points.In other words, in this case, local error measurement plays a dominant role in determining the weight factors.Conversely, global error measurement accounts for a larger proportion when the prediction sample is exactly located in the main region.

Illustration of the proposed method
In this paper, a distance and mixture error based ensemble surrogate model (DMEES) is proposed.The proposed DMEES incorporates the position of the prediction sample, the mixture error metric, and the leaning features of the component surrogate models.Considering the global error, local error, and the spatial relationship between prediction sample and training samples, the initial weights can be calculated using the following equation: where w initial i represents the initial weight of the i th component surrogate model at the prediction sample x. w global i is the weight combination factor that considers global error, which can be calculated using Eq 4. w pv i is the weight factor that considers local error, which can be calculated using Eq 6.When calculating the prediction variance PV i (x) in Eq 6, only the neighboring points whose Euclidean distance to the prediction sample is smaller than the average Euclidean distance among all training samples are considered.
The parameter λ 2 [0, 1] is a distance-related parameter.It controls the influence of global error measurement and local error measurement based on the distance relationship between prediction sample and trainning sanples.λ can be calculated using the following equation:  r, r 1 and r 2 are expressed as following equation.The current weight calculation methods for ensemble surrogate models have not effectively taken into account the learning characteristics of component surrogate models.The commonly used surrogate models can be classified into two types: fitting models and interpolation models.Both types establish approximation functions based on a finite set of training samples to represent the relationships between the input and output variables.Fitting models typically minimize the total error at the training samples through regression equations, while interpolation models construct approximation functions based on spatial distances to ensure that the functions pass through all training samples.
Fig 3 shows the approximations of the Forrester function [35] using SVR, KRG, and RBF based on five given sample points.SVR and PR models are typical fitting models, while KRG and RBF models are typical interpolation models.When using interpolation models, the prediction of a given prediction sample is more influenced by the neighboring points if it is closer to a training sample.Therefore, as interpolation models, KRG and RBF models exhibit higher accuracy near the existing sample points.In this study, we take this factor into consideration when constructing the ensemble surrogate model.We adjust the weights of the fitting models by introducing a fitting model weight adjustment factor, enabling the ensemble surrogate model to possess interpolation model characteristics.The fitting model weight adjustment factor α can be calculated using the following equation, where fitting model weight adjustment factor, the final weight w final i for the DMEES model is given by, If model is a fitting model Based on the final weighting factors of the component surrogate models, the proposed DMEES model predicts the value at the prediction sample x as follows: According to Eq 12, when d 1 = 0, it means the prediction sample coincides with a training sample point, the adjustment factor α for the fitting model weights becomes 0. Consequently, the component surrogate model with fitting model characteristics has a weight of 0 at that prediction sample.Therefore, the DMEES model also possesses the feature of an interpolation model, a zero prediction error at the training sample points.Fig 4 illustrates the process of calculating the weights of the component surrogate models at a given prediction sample x when using DMEES.After determining the ranges of design variables and the system output, the training sample input matrix X can be obtained using the Optimal Latin Hypercube Sampling (OLHS) algorithm.The simulation experiments are then conducted to obtain the training sample output vector y.Based on the available training samples, the weights of each component surrogate model at the prediction sample can be calculated using the following steps:

Flowchart of the proposed method
Step 1: Using the leave-one-out cross-validation (LOO-CV) approach, obtain the outputs y À i corresponding to the i th component surrogate models (KRG, RBF, SVR).
Step 2: With the support of the data y and y À i , calculate the weights of the component surrogate models based on the global error measurement using Eq 4, that is, w global i ¼ w bpr i .
Step 3: Calculate the average Euclidean distance r 1 and the minimum average Euclidean distance of a training sample to other training samples r 2 with the known X. Calculate the average Euclidean distance r between the prediction sample x and the training samples X.Given the distance data r 1 , r 2 , and r, calculate the distance control parameter λ for the weights of the component surrogate models using Eq 10.
Step 4: With the known y, y À i , and r, calculate the weights of the component surrogate model based on the local error using Eq 6, namely, w pv i .When evaluating Eq 6, the value of k represents the number of training samples whose Euclidean distance from the prediction sample is smaller than the average Euclidean distance r 1 between all training samples.
Step 5: Given the weights of the component surrogate models based on global error w global i , the weights of the component surrogate models based on local error w pv i , and the weight control parameter based on distance λ, calculate the initial weights of the component surrogate model using Eq 9.

Performance test of various surrogate models
This section will test the performance of the proposed method through several numerical and engineering examples.The proposed method will be compared with some surrogate models mentioned in this paper.The component surrogate models for constructing the ensemble surrogate model include the KRG model with a first-order polynomial and Gaussian correlation function, the RBF model with a Gaussian kernel function, and the SVR model with a Gaussian kernel function.The RBF and KRG models are implemented using the Modularized Surrogate Model Toolbox, a surrogate modeling MATLAB toolkit developed by Muller et al. [36].The hyperparameters of the surrogate model use the default parameter settings, and an internal tuning algorithm is integrated into the toolbox, ensuring that the approximate model achieves high accuracy in most cases.The SVR model is implemented using the Libsvm toolkit developed by Chih-Jen Lin et al. [37].Since the regularization parameter and kernel bandwidth parameter of the SVR model have a significant impact on the model's generalization performance, a grid search method is used to optimize these parameters during the actual modeling process.
Since relying solely on a single error metric cannot objectively reflect the performance of the model, two commonly used model error metrics, namely the Maximum Absolute Error (MAE) and Root Mean Square Error (RMSE), are used to evaluate the accuracy of the models.MAE and RMSE are formulated as follows: RMSE ¼ ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffiffi 1 N where N represents the number of test samples, y i represents the actual output value of the i th sample, ŷi represents the output value predicted by the surrogate model, and � y represents the mean value of the actual output values for all test samples.RMSE and MAE are served as the global error measurement and local error measurement of a surrogate model, respectively [38,39].The smaller the values of RMSE and MAE, the higher the accuracy of the surrogate model.

Nonlinear function test
Five nonlinear functions are selected to test the approximation capability of the proposed approach.The selected test functions are the Six-hump camel-back function, Ackley function, Hartman function, Extended Rosenbrock function, and Dixon-Price function [40,41].The five test functions are numbered from 1 to 5 in order.Table 2 provides the expressions of the test functions and information about the variable ranges.The parameters related to the Hartman function can be found in [40].

No. Dimensions Expression
VVariable range ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi ffi 1 4 10,10] https://doi.org/10.1371/journal.pone.0293318.t002 Taking into account the influence of sample size on the approximation ability of the surrogate model, we set three different sizes of training samples.Sample sizes for small, medium, and large cases are set as 6n d , 12n d , and 24n d , respectively, in which n d represents the number of independent variables.The test samples are generated using the OLHS method with a sample size of 500.Due to the significant magnitude differences in the outputs of different test functions, in order to facilitate the comparison and analysis of model performance, the test sample outputs and surrogate model outputs are normalized based on the maximum and minimum values of the test sample outputs.Then, the model errors could be calculated based on the normalized data.The robustness of the surrogate model, when trained with different training sample sets, is tested by generating 100 different sets of training samples using the Matlab function 'lhsdesign' with different random seed settings.Table 3 presents the median values According to Table 3, it can be observed that the errors of all surrogate models decrease as the training sample size increases.The performance of component surrogate models varies significantly depending on the specific problem.The SVR model exhibits better accuracy compared to the other two component surrogate models when approximating the Ackley function.However, when approximating the Extended Rosenbrock function and Dixon-Price function, the SVR model performs the worst, while the RBF model demonstrates higher accuracy compared to SVR and KRG.RBF model also shows the highest accuracy when approximating the Dixon-Price function, compared with the SVR and KRG models.The performance of the three individual surrogate models varies with the sample size and the function being approximated when approximating the Six-hump camel-back function and Hartman function.
Compared to all other surrogate models, the proposed DMEES method achieves higher accuracy in most cases.When approximating low-dimensional functions such as the six-hump camelback function and the Ackley function, the DMEES model exhibits smaller median values of MAE and RMSE at small sample sizes.Similarly, when approximating high-dimensional functions such as the Based on the numerical test results, it is evident that the proposed DMESS model outperforms other surrogate models in terms of approximation accuracy.It's important to note that, in some cases, the limitations in the robustness of DMESS should be considered.As mentioned in the introduction, the primary performance criterion for surrogate models is approximation accuracy.However, it is crucial to acknowledge that no surrogate model can achieve optimal performance for all problems under all evaluation criteria.

Engineering case study
To validate the effectiveness of the DMEES model in solving engineering problems, an automotive exhaust pipe design case is used to test the approximation capability.Noise, vibration and harshness (NVH) performance is one of the most important factors directly affecting the driving experience of customers.A well-designed exhaust system plays a pivotal role in enhancing NVH performance [42].In the optimization design of structural parameters for automotive exhaust systems, surrogate models are often used as approximations of computationally expensive finite element simulation models to improve efficiency [43][44][45].In this case, the performance of the surrogate model is studied by approximating the mathematical relationship between the structural parameters of the exhaust system and the peak value of frequency response obtained from finite element analysis.of the vehicle.Since the frequencies of the vehicle at idle (35Hz to 40Hz), parked idle (40Hz to 50Hz), and non-idle (50Hz to 70Hz) are less than 80Hz, and the modal frequencies that cause vibration in the exhaust system will be higher than 20Hz, the Nastran software is used to calculate the reaction forces of the exhaust hangers on the chassis and body structure within the specific frequency range of 20Hz to 80Hz.An example of frequency responses for one design parameter setting is shown in Fig 7(b).The peak values of the reaction forces on the four hangers are labeled as PF 1 , PF 2 , PF 3 , and PF 4 , respectively.
The main design parameters of the exhaust pipe include the hanger locations and the thickness of the pipe.4 provides detailed information about these variables and their ranges.In this case, a surrogate model will be constructed to approximate the mathematical relationship between these design variables and the peak forces PF 1 , PF 2 , PF 3 , and PF 4 .
Herein, 1000 sets of samples are generated by using OLHS based on the design variable ranges provided by Table 4.The simulation is then run to extract the peak forces corresponding to each sample, forming a sample data base with 1000 sets of samples.Randomly selected training samples are used to train the surrogate model, while the remaining samples are used as test samples to calculate the error metrics of the surrogate model.The number of training samples is set to 6n d for the small sample case, 12n d for the medium sample case, and 24n d for the large sample case, where n d is the number of independent variables.To avoid the contingency of test results, the test is repeated 100 times by using different random seeds setting.
Figs 8-11 provide the boxplots of the MAE and RMSE distributions of the surrogate models when approximating PF 1 , PF 2 , PF 3 , and PF 4 under random testing.From these four figures, a decrease in errors for all surrogate models could be be observed as the training sample size increases.This is because with more training samples, the surrogate models can learn more, and their accuracy becomes higher.According to the distribution of the RMSE, the DMEES model exhibits relatively poorer robustness compared to other models from the perspective of global accuracy.However, when examining the distribution of the MAE values, considering the variations in the approximation targets and sample sizes, the robutness of the DMEES model is similar to that of other models in only a few cases, while in the majority of cases, the DMEES model demonstrates better robustness than the other models.Comparing accuracy and robustness, DMESS clearly outperforms other surrogate models.
From Figs 8-11, it can also be observed that when approximating PF 1 , PF 2 , PF 3 , and PF 4 , both in terms of RMSE and MAE, the 75 th percentile of DMEES model is lower than the

Conclusion
This paper presents a novel adaptive-weight ensemble surrogate model based on the principles of distance and mixture error.The new ensemble surrogate model takes into account the  2. The proposed DMEES model demonstrates strong robustness in specific scenarios but shows reduced robustness compared to alternative surrogate models in other cases, as measured by local accuracy.This conclusion also applies to model robustness when measured by global accuracy.
3. While no surrogate model can be universally applicable to all problems across every evaluation criteria, the proposed DMEES model consistently exhibits superior predictive accuracy across a wide range of performance evaluation metrics.These results underscore the significant advantages of our approach and its promising potential for optimizing surrogate-based structural design.

2
dr ð10Þ r represents the average Euclidean distance between the prediction sample x and the training samples, which reflects the distance between the sample to be predicted and the main area where most samples are located.r 1 is the average Euclidean distance of training samples, indicating the uniformity of the distribution of training samples.r 2 is the minimum average Euclidean distance of a training sample to other training samples, indicating the density of the training samples.For a given training set S = (x 1 , y 1 ), (x 2 , y 2 ), (x 3 , y 3 ), . .., (x n , y n ), where

Fig 1
Fig 1 illustrates the average Euclidean distance (r) between the prediction sample, x and a given set of training samples.The black dots in the figure are training samples, and the contour

Fig 3 .
Fig 3.The approximation of Forrester functions by interpolation and fitting models.The KRG model uses firstorder polynomials and Gaussian correlation functions, the RBF model uses Gaussian kernel functions, and the SVR model uses Gaussian kernel functions.https://doi.org/10.1371/journal.pone.0293318.g003

Step 6 :
Calculate the Euclidean distances d 1 and d 2 between the prediction sample x and the nearest and second nearest samples in the training samples X. Compute the weight adjustment factor for the fitting models α based on Eq 12, and determine the final weights of the component surrogate models w final i based on Eq 13.

Fig 4 .
Fig 4. Process for calculating the weight factor of the DMEES model at a prediction sample.https://doi.org/10.1371/journal.pone.0293318.g004 Hartman function, Extended Rosenbrock function, and Dixon-Price function, the DMEES model consistently demonstrates small model errors regardless of the sample size.Particularly, when approximating the Extended Rosenbrock function and Dixon-Price function, regardless of the size of the training sample used, the median values for the error measurements of the DMEES model are significantly lower than those of other surrogate models.The median value of MAE for the DMEES model is close to half of the median value of MAE for other models, highlighting the significant accuracy advantage of the DMEES model.The boxplots in Figs 5 and 6 respectively present the error distributions of surrogate models when approximating the Extended Rosenbrock function and Dixon-Price function.These boxplots are used to illustrate the error distribution characteristics of the surrogate models.The horizontal red line in the middle of the blue box in the graph represents the median value of approximation error of a surrogate model.The two horizontal blue lines of the box, from bottom to top, represent the 25 th percentile and 75 th percentile of the approximation error of a surrogate model, respectively.The lines extending above and below the box (whiskers) indicate the overall distribution of the approximation error of a surrogate model.Each red cross outside the range of the error distribution represents an outlier.The black horizontal lines at the ends of the whiskers represent the upper and lower values of the error, excluding outliers.As shown in Fig 5, the MAE distribution of the DMEES model is more scattered compared to the other models.The wide distribution of MAE values suggests that, in terms of local accuracy, the DMESS model is more sensitive to training sample variations when fitting the Extended Rosenbrock functions compared to other models.The higher the sensitivity of a surrogate model to variations in the training sample, the worse its robustness.However, the distribution of RMSE values in Fig 5 indicates that the DMEES model has a similar level of robustness as the other surrogate models (except for SVR) from the perspective of global accuracy.As shown in Fig 6, the widespread distribution of MAE and RMSE indicates that, both from the perspective of global accuracy and local accuracy, the DMESS model is more sensitive to variations in the training sample than other models when fitting Dixon-Price function.It also can be observed from the Figs 5 and 6 that when approximating the Extended Rosenbrock function and Dixon-Price function, regardless of the sample size, the 75 th percentile line of the DMEES model's error is lower than the minimum error value (lower edge of the error box) of the other models.This implies that 100 model performance tests, the DMEES model achieves the minimum MAE and RSME in at least 75 of the tests.Therefore, it is evident that the DMEES model exhibits a significant accuracy advantage when approximating the Extended Rosenbrock function and the Dixon-Price function.

Fig 5 .
Fig 5. Boxplots of approximation errors for surrogate models when fitting the Extended Rosenbrock function.100 tests of surrogate model are performed for each sample size.The training samples of each test are generated randomly.(a) RMSE and MAE distributions under small training samples.(b) RMSE and MAE distributions under medium training samples.(c) RMSE and MAE distributions under large training samples.https://doi.org/10.1371/journal.pone.0293318.g005

Fig 6 .
Fig 6.Boxplots of approximation errors for surrogate models when fitting the Dixon-Price function.100 tests of surrogate model are performed for each sample size.The training samples of each test are generated randomly.(a) RMSE and MAE distributions under small training samples.(b) RMSE and MAE distributions under medium training samples.(c) RMSE and MAE distributions under large training samples.https://doi.org/10.1371/journal.pone.0293318.g006 Fig 7 shows the finite element model of the automotive exhaust pipe and the frequency response of the reaction forces at various hanger positions.As shown in Fig 7 there are four hangers in the exhaust system, including the transmission bracket hanger, the muffler inlet hanger, the muffler outlet hanger, and the tailpipe hanger.Excitation from the engine causes the exhaust hangers to exert reaction forces on the chassis and body structure, which play an important role in evaluating the overall NVH performance

Fig 7 .
Fig 7. Frequency response analysis model and hanger force-frequency curve of automotive exhaust pipe system.https://doi.org/10.1371/journal.pone.0293318.g007 Fig 7(a) indicates 11 variables that include the hanger locations and the thickness of the pipe.Table /doi.org/10.1371/journal.pone.0293318.t004minimum error of other models.This indicates that in at least 75 out of 100 tests, the DMEES model has the smallest error.In a few cases, such as the MAE boxplots in Fig 8(a) and 8(b), and the RMSE boxplot in Fig 9(c), the upper edge of the DMEES model's boxplot is lower than the lower edge of the boxplot corresponding to other surrogate models.This suggests that in each testing instance, the DMEES model consistently achieves the minimum error value.Therefore, it can be concluded that the DMEES model has a significant accuracy advantage.

Fig 8 .
Fig 8. Boxplots of approximation errors for surrogate models when fitting PF 1 .100 tests of surrogate model are performed for each sample size.The training samples of each test are generated randomly.(a) RMSE and MAE distributions under small training samples.(b) RMSE and MAE distributions under medium training samples.(c) RMSE and MAE distributions under large training samples.https://doi.org/10.1371/journal.pone.0293318.g008

Fig 9 .
Fig 9. Boxplots of approximation errors for surrogate models when fitting PF 2 .100 tests of surrogate model are performed for each sample size.The training samples of each test are generated randomly.(a) RMSE and MAE distributions under small training samples.(b) RMSE and MAE distributions under medium training samples.(c) RMSE and MAE distributions under large training samples.https://doi.org/10.1371/journal.pone.0293318.g009

Fig 10 .
Fig 10.Boxplots of approximation errors for surrogate models when fitting PF 3 .100 tests of surrogate model are performed for each sample size.The training samples of each test are generated randomly.(a) RMSE and MAE distributions under small training samples.(b) RMSE and MAE distributions under medium training samples.(c) RMSE and MAE distributions under large training samples.https://doi.org/10.1371/journal.pone.0293318.g010 The surrogate model can replace time-consuming simulation calculations during the optimization process, allowing for the completion of computationally intensive global optimization within a short period.Commonly used individual surrogate models in engineering structure design includes Polynomial Regression model (PR), Support Vector Regression model (SVR), Kriging/Gaussian Process model (KRG/GP) and Radical Basis Function interpolation model (RBF).The characteristic of these models are summarized in Table 1.Different surrogate models can capture different features and patterns in the data, making them suitable for various engineering problems.Ensemble surrogate model can leverage the advantages of multiple individual surrogate models to enhance the overall forecasting performance, mitigate the risk of over-fitting associated with a single model, and deliver more consistent forecasting results.An ensemble surrogate model, also known as a mixture surrogate model, hybrid surrogate model or integrated surrogate model, is generally composed of several trained individual surrogate models combined through weighted aggregation.It has the following form:

Table 3 . Median values of approximation error for surrogate models under different training samples.
the error measurements of the surrogate models when trained with different training sample sets.The gray background in the table represents individual surrogate models, while the white background represents ensemble surrogate models.The best-performing individual surrogate model and the best-performing ensemble surrogate model are highlighted in bold in the table. https://doi.org/10.1371/journal.pone.0293318.t003for